
Opal Hart
5 years ago

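20. ONE-WAY ANALYSIS OF VARIANCE

20.1. Balanced One-Way Classification

Cell means parametrization:

\[
Y_{ij} = \mu_i + \varepsilon_{ij}, \quad i = 1,\dots,I;\ j = 1,\dots,J, \qquad \varepsilon_{ij} \sim N(0,\sigma^2) \ \text{independent}.
\]

In matrix form, $Y = X\beta + \varepsilon$, or

\[
\begin{pmatrix} Y_1 \\ \vdots \\ Y_I \end{pmatrix}
=
\begin{pmatrix}
1_J & 0_J & \cdots & 0_J \\
0_J & 1_J & \cdots & 0_J \\
\vdots & & \ddots & \vdots \\
0_J & 0_J & \cdots & 1_J
\end{pmatrix}
\begin{pmatrix} \mu_1 \\ \vdots \\ \mu_I \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_I \end{pmatrix},
\]

where $Y_i = (Y_{i1},\dots,Y_{iJ})'$, $\varepsilon_i = (\varepsilon_{i1},\dots,\varepsilon_{iJ})'$, and $1_J$, $0_J$ are the $J$-vectors of ones and zeros.

The least-squares estimates are $\hat\mu_i = \bar Y_{i\cdot} = \frac{1}{J}\sum_j Y_{ij}$. So the RSS is

\[
RSS = \sum_i \sum_j \hat\varepsilon_{ij}^2 = \sum_i \sum_j (Y_{ij} - \bar Y_{i\cdot})^2 .
\]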
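The claim that least squares returns the group means can be checked numerically. The following is a minimal sketch, assuming NumPy and made-up values for I, J and the true cell means (none of these numbers come from the notes).

```python
import numpy as np

rng = np.random.default_rng(0)
I, J = 3, 4                                   # hypothetical: 3 groups, 4 observations each
mu = np.array([1.0, 2.0, 4.0])                # made-up true cell means
Y = mu[:, None] + rng.normal(size=(I, J))     # Y[i, j] = mu_i + eps_ij

# Design matrix of the cell-means model: one column of 1_J blocks per group.
X = np.kron(np.eye(I), np.ones((J, 1)))       # shape (I*J, I)
y = Y.reshape(-1)                             # stack Y_1, ..., Y_I into one vector

# Least-squares fit of Y = X beta + eps.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

group_means = Y.mean(axis=1)                  # Ybar_i.
rss = np.sum((Y - group_means[:, None]) ** 2) # sum_i sum_j (Y_ij - Ybar_i.)^2
```

Here `beta_hat` agrees with `group_means` up to floating-point error, and `rss` equals the squared norm of `y - X @ beta_hat`.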

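Alternative Parametrization of the Model:

\[
Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}.
\]

$\mu$ is interpreted as the overall mean ("grand mean"); $\alpha_i$ is interpreted as the difference between the mean of group $i$ and the overall mean.

Are the $\alpha_i$ estimable with this parametrization? No: adding a constant to $\mu$ and subtracting it from every $\alpha_i$ leaves the model unchanged.

Note that $\alpha_i$ is like $\mu_i - \mu$ from the previous parametrization. Therefore $\sum_{i=1}^{I} \alpha_i = 0$ is a natural constraint, because $\sum_i \alpha_i = \sum_i (\mu_i - \mu) = 0$ when $\mu$ is the average of the $\mu_i$. This is an identifiability constraint for the model.

The unique least-squares estimates satisfying $\sum_i \alpha_i = 0$ can be derived by the following trickery. Rewrite $\varepsilon_{ij}$ as

\[
\varepsilon_{ij} = \bar\varepsilon_{\cdot\cdot} + (\bar\varepsilon_{i\cdot} - \bar\varepsilon_{\cdot\cdot}) + (\varepsilon_{ij} - \bar\varepsilon_{i\cdot}),
\]

where $\bar\varepsilon_{i\cdot} = \frac{1}{J}\sum_j \varepsilon_{ij}$ and $\bar\varepsilon_{\cdot\cdot} = \frac{1}{IJ}\sum_i \sum_j \varepsilon_{ij}$.

Square both sides of this expression and sum over all $i$ and $j$. The cross-product terms are 0.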
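One way to see the lack of estimability is through the rank of the design matrix. The sketch below assumes NumPy and arbitrary illustrative values (I = 3, J = 4, a made-up parameter vector); it shows that the design for (mu, alpha_1, ..., alpha_I) has rank I rather than I + 1, and that appending the sum-to-zero constraint restores full column rank.

```python
import numpy as np

I, J = 3, 4                                        # hypothetical layout
cell = np.kron(np.eye(I), np.ones((J, 1)))         # cell-means design, shape (I*J, I)
X_alt = np.hstack([np.ones((I * J, 1)), cell])     # columns: mu, alpha_1, ..., alpha_I

rank_alt = np.linalg.matrix_rank(X_alt)            # only I, although there are I + 1 columns

# Moving a constant c from the alphas into mu does not change the fitted values.
theta = np.array([10.0, 1.0, -2.0, 1.0])           # made-up (mu, alpha_1, alpha_2, alpha_3)
c = 5.0
theta_shifted = theta + np.array([c, -c, -c, -c])
same_fit = np.allclose(X_alt @ theta, X_alt @ theta_shifted)

# Appending the constraint row for sum_i alpha_i = 0 removes the ambiguity.
constraint = np.array([[0.0, 1.0, 1.0, 1.0]])
rank_constrained = np.linalg.matrix_rank(np.vstack([X_alt, constraint]))
```

With these values `rank_alt` is 3 and `rank_constrained` is 4, which is the numerical counterpart of the identifiability constraint.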

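\[
\sum_{i=1}^{I} \sum_{j=1}^{J} \varepsilon_{ij}^2
= IJ\,\bar\varepsilon_{\cdot\cdot}^2
+ J \sum_{i=1}^{I} (\bar\varepsilon_{i\cdot} - \bar\varepsilon_{\cdot\cdot})^2
+ \sum_{i=1}^{I} \sum_{j=1}^{J} (\varepsilon_{ij} - \bar\varepsilon_{i\cdot})^2 .
\]

Now make some substitutions, writing $\bar Y_{i\cdot}$ for the mean of group $i$ and $\bar Y_{\cdot\cdot}$ for the grand mean:

\[
\begin{aligned}
\varepsilon_{ij} &= Y_{ij} - \mu - \alpha_i, \\
\bar\varepsilon_{i\cdot} &= \bar Y_{i\cdot} - \mu - \alpha_i, \\
\bar\varepsilon_{\cdot\cdot} &= \bar Y_{\cdot\cdot} - \mu - \frac{1}{I}\sum_i \alpha_i = \bar Y_{\cdot\cdot} - \mu .
\end{aligned}
\]

This gives

\[
\sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \mu - \alpha_i)^2
= IJ\,(\bar Y_{\cdot\cdot} - \mu)^2
+ J \sum_{i=1}^{I} (\bar Y_{i\cdot} - \bar Y_{\cdot\cdot} - \alpha_i)^2
+ \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \bar Y_{i\cdot})^2 .
\]

What is the left-hand side? It is the residual sum of squares as a function of $\mu$ and the $\alpha_i$. The last term on the right does not involve the parameters, and the first two terms are zero at the least-squares estimates:

\[
\hat\mu = \bar Y_{\cdot\cdot} \quad\text{and}\quad \hat\alpha_i = \bar Y_{i\cdot} - \bar Y_{\cdot\cdot} .
\]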
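The three-term decomposition of the sum of squares can be verified on simulated data. This is a small sketch, assuming NumPy, a made-up balanced data set, and an arbitrary trial value of (mu, alpha) that satisfies the sum-to-zero constraint.

```python
import numpy as np

rng = np.random.default_rng(1)
I, J = 4, 5                                              # hypothetical layout
Y = np.array([0.0, 1.0, 3.0, 2.0])[:, None] + rng.normal(size=(I, J))

Ybar_i = Y.mean(axis=1)                                  # group means
Ybar = Y.mean()                                          # grand mean

def criterion(mu, alpha):
    """Left-hand side: sum_i sum_j (Y_ij - mu - alpha_i)^2."""
    return np.sum((Y - mu - alpha[:, None]) ** 2)

def decomposition(mu, alpha):
    """Right-hand side: the three-term decomposition."""
    return (I * J * (Ybar - mu) ** 2
            + J * np.sum((Ybar_i - Ybar - alpha) ** 2)
            + np.sum((Y - Ybar_i[:, None]) ** 2))

# Arbitrary trial parameters satisfying sum_i alpha_i = 0.
alpha = rng.normal(size=I)
alpha -= alpha.mean()
mu = 0.7
lhs, rhs = criterion(mu, alpha), decomposition(mu, alpha)

# The least-squares estimates zero out the first two terms.
mu_hat, alpha_hat = Ybar, Ybar_i - Ybar
rss = criterion(mu_hat, alpha_hat)
```

`lhs` and `rhs` agree for any constrained trial value, and `rss` is never larger than `lhs`.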

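F-test for Group Differences

Test $H: \mu_1 = \mu_2 = \cdots = \mu_I$, i.e., a linear hypothesis $H: A(\mu_1,\dots,\mu_I)' = 0$ for a suitable $(I-1) \times I$ contrast matrix $A$.

The F statistic is

\[
F = \frac{(RSS_H - RSS)/(I-1)}{RSS/[IJ - I]},
\qquad\text{where}\qquad
RSS_H = \sum_i \sum_j (Y_{ij} - \bar Y_{\cdot\cdot})^2 .
\]

We have a similar algebraic identity:

\[
\begin{aligned}
\sum_i \sum_j (Y_{ij} - \bar Y_{\cdot\cdot})^2
&= \sum_i \sum_j (Y_{ij} - \bar Y_{i\cdot})^2 + \sum_i \sum_j (\bar Y_{i\cdot} - \bar Y_{\cdot\cdot})^2 \\
&= \sum_i \sum_j (Y_{ij} - \bar Y_{i\cdot})^2 + J \sum_i (\bar Y_{i\cdot} - \bar Y_{\cdot\cdot})^2 ,
\end{aligned}
\]

that is,

\[
RSS_H = RSS + J \sum_i (\bar Y_{i\cdot} - \bar Y_{\cdot\cdot})^2 .
\]

The F statistic becomes

\[
F = \frac{J \sum_i (\bar Y_{i\cdot} - \bar Y_{\cdot\cdot})^2 / (I-1)}{\sum_i \sum_j (Y_{ij} - \bar Y_{i\cdot})^2 / [IJ - I]} \sim F_{I-1,\, I(J-1)} \quad\text{under } H .
\]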
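As an illustration, the two forms of the F statistic can be computed side by side. The function name `one_way_f` and the tiny data set are made up for this sketch (NumPy assumed); the numbers are chosen so the arithmetic can be followed by hand.

```python
import numpy as np

def one_way_f(Y):
    """F statistic for a balanced one-way layout; Y has shape (I, J)."""
    I, J = Y.shape
    group_means = Y.mean(axis=1)
    grand_mean = Y.mean()
    rss = np.sum((Y - group_means[:, None]) ** 2)        # full-model RSS
    rss_h = np.sum((Y - grand_mean) ** 2)                # RSS under H
    ss_groups = J * np.sum((group_means - grand_mean) ** 2)
    f_from_rss = ((rss_h - rss) / (I - 1)) / (rss / (I * J - I))
    f_from_means = (ss_groups / (I - 1)) / (rss / (I * (J - 1)))
    return f_from_rss, f_from_means, rss, rss_h, ss_groups

# Made-up data: group means 2, 3, 7 and grand mean 4.
Y = np.array([[1.0, 2.0, 3.0],
              [2.0, 3.0, 4.0],
              [6.0, 7.0, 8.0]])
f1, f2, rss, rss_h, ss_groups = one_way_f(Y)
# rss = 6, ss_groups = 3 * (4 + 1 + 9) = 42, so F = (42 / 2) / (6 / 6) = 21
```

Both `f1` and `f2` equal 21 here, and `rss_h` equals `rss + ss_groups`, matching the identity above.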

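The results of a one-way ANOVA are often displayed in an ANOVA table:

\[
\begin{array}{lllll}
\text{Source} & \text{df} & \text{SS} & \text{MS} & F \\
\hline
\text{Groups} & I-1 & SS_A = J \sum_i (\bar Y_{i\cdot} - \bar Y_{\cdot\cdot})^2 & MS_A = \dfrac{SS_A}{I-1} & \dfrac{MS_A}{MS_E} \\
\text{Error} & I(J-1) & SS_E = \sum_i \sum_j (Y_{ij} - \bar Y_{i\cdot})^2 & MS_E = \dfrac{SS_E}{I(J-1)} & \\
\text{Total} & IJ-1 & \sum_i \sum_j (Y_{ij} - \bar Y_{\cdot\cdot})^2 & &
\end{array}
\]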

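Example

Compare the breaking strength of steel plates for 5 brands of cars:

\[
\begin{array}{lll}
\text{Country} & \text{Brand} & \text{Mean} \\
\hline
\text{U.S.} & \text{GM(1)} & \mu_1 \\
\text{U.S.} & \text{GM(2)} & \mu_2 \\
\text{U.S.} & \text{Ford} & \mu_3 \\
\text{Japan} & \text{Toyota} & \mu_4 \\
\text{Japan} & \text{Honda} & \mu_5
\end{array}
\]

Data are measurements on $J = 4$ samples per brand. Suppose we get

\[
\begin{array}{llllll}
\text{Source} & \text{df} & \text{SS} & \text{MS} & F & p \\
\hline
\text{Groups} & 4 & \dots & \dots & \dots/0.34 = \dots & \dots \\
\text{Error} & 15 & \dots & \dots & & \\
\text{Total} & 19 & \dots & & &
\end{array}
\]

Using the F-test, do we reject $H: \mu_1 = \mu_2 = \cdots = \mu_5$? Yes, we reject at the 0.05 and 0.01 levels.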

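Orthogonal Contrasts

Some more algebra. With $P = X(X'X)^{-1}X'$, $\hat\beta = (X'X)^{-1}X'Y$, and any invertible matrix $A$:

\[
\begin{aligned}
\text{REG SS} &= (PY)'(PY) = Y'PY = Y'X(X'X)^{-1}X'Y \\
&= Y'X[(X'X)^{-1}(X'X)](X'X)^{-1}X'Y \\
&= \hat\beta'(X'X)\hat\beta \\
&= \hat\beta' A'(A')^{-1}(X'X)A^{-1}A\hat\beta \\
&= (A\hat\beta)'[(A')^{-1}(X'X)A^{-1}](A\hat\beta) \\
&= (A\hat\beta)'[A(X'X)^{-1}A']^{-1}(A\hat\beta).
\end{aligned}
\]

This decomposition is general, but we now apply it to one-way ANOVA using a matrix $A$ whose rows $a_i'$ are pairwise orthogonal (so that $AA'$ is diagonal). With the cell-means parametrization, $(X'X)^{-1} = J^{-1}\,\mathbf{I}$, where $\mathbf{I}$ is the $I \times I$ identity matrix. Then

\[
\text{REG SS} = (A\hat\beta)'[A(X'X)^{-1}A']^{-1}(A\hat\beta)
= (A\hat\beta)'[AA'/J]^{-1}(A\hat\beta)
= \sum_i \frac{(a_i'\hat\beta)^2}{a_i'a_i/J} .
\]

In one-way ANOVA, we sometimes use orthogonal contrasts to further decompose the regression sum of squares.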
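The chain of equalities can be checked numerically. The sketch below assumes NumPy, a made-up balanced data set with I = 3 and J = 4, and two illustrative choices of A: a fixed invertible matrix for the general identity, and a matrix with pairwise orthogonal rows for the sum-of-pieces form.

```python
import numpy as np

rng = np.random.default_rng(2)
I, J = 3, 4                                          # hypothetical layout
X = np.kron(np.eye(I), np.ones((J, 1)))              # cell-means design
y = np.repeat([1.0, 2.0, 4.0], J) + rng.normal(size=I * J)

XtX = X.T @ X                                        # equals J times the identity here
beta_hat = np.linalg.solve(XtX, X.T @ y)
P = X @ np.linalg.solve(XtX, X.T)                    # projection onto the column space of X
reg_ss = y @ P @ y                                   # Y' P Y

# General identity, using an arbitrary invertible A (determinant 7).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])
Ab = A @ beta_hat
reg_ss_general = Ab @ np.linalg.solve(A @ np.linalg.inv(XtX) @ A.T, Ab)

# Special case: rows of A pairwise orthogonal, so REG SS splits into I pieces.
A_orth = np.array([[1.0, 1.0, 1.0],
                   [1.0, -1.0, 0.0],
                   [1.0, 1.0, -2.0]])
pieces = (A_orth @ beta_hat) ** 2 / (np.sum(A_orth ** 2, axis=1) / J)
```

`reg_ss`, `reg_ss_general`, and `pieces.sum()` all agree up to rounding. The first entry of `pieces` corresponds to the overall mean; the remaining entries are contrast sums of squares.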

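Example Continued

Consider the following four orthogonal contrasts of the cell means. (There are four degrees of freedom, since the fifth degree of freedom is for the overall mean.) We partition the Groups sum of squares into four smaller sums of squares corresponding to these four orthogonal contrasts:

\[
\begin{array}{ll}
\text{U.S. vs. Japanese:} & a_1'\beta = \left(\tfrac13, \tfrac13, \tfrac13, -\tfrac12, -\tfrac12\right)\beta \\
\text{GM vs. Ford:} & a_2'\beta = \left(\tfrac12, \tfrac12, -1, 0, 0\right)\beta \\
\text{GM(1) vs. GM(2):} & a_3'\beta = (1, -1, 0, 0, 0)\,\beta \\
\text{Toyota vs. Honda:} & a_4'\beta = (0, 0, 0, 1, -1)\,\beta
\end{array}
\]

Note that $a_i'a_j = 0$ for $i \neq j$.

\[
\begin{array}{llllll}
\text{Source} & \text{df} & \text{SS} & \text{MS} & F & p \\
\hline
\text{Groups:} & 4 & \dots & \dots & \dots & \dots \\
\quad\text{U.S. vs. Japanese} & 1 & \dots & \dots & \dots & \dots \\
\quad\text{GM vs. Ford} & 1 & \dots & \dots & \dots & \dots \\
\quad\text{GM(1) vs. GM(2)} & 1 & \dots & \dots & \dots & \dots \\
\quad\text{Toyota vs. Honda} & 1 & \dots & \dots & \dots & \dots \\
\text{Error} & 15 & \dots & \dots & &
\end{array}
\]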
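The partition of the Groups sum of squares can be sketched in code. The contrast vectors are the four given above; the breaking-strength data are simulated with made-up means and noise level, since the original measurements are not reproduced here. NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(3)
I, J = 5, 4
true_means = np.array([10.0, 10.5, 11.0, 12.0, 12.5])   # made-up values, not from the example
Y = true_means[:, None] + rng.normal(scale=0.6, size=(I, J))

beta_hat = Y.mean(axis=1)                                # cell-means estimates
grand_mean = Y.mean()
ss_groups = J * np.sum((beta_hat - grand_mean) ** 2)     # Groups SS, 4 df

contrasts = {
    "U.S. vs. Japanese": np.array([1/3, 1/3, 1/3, -1/2, -1/2]),
    "GM vs. Ford":       np.array([1/2, 1/2, -1.0, 0.0, 0.0]),
    "GM(1) vs. GM(2)":   np.array([1.0, -1.0, 0.0, 0.0, 0.0]),
    "Toyota vs. Honda":  np.array([0.0, 0.0, 0.0, 1.0, -1.0]),
}

A = np.vstack(list(contrasts.values()))
gram = A @ A.T                                           # diagonal if the rows are orthogonal

# One-df sum of squares for each contrast: (a' beta_hat)^2 / (a'a / J).
contrast_ss = {name: (a @ beta_hat) ** 2 / (a @ a / J)
               for name, a in contrasts.items()}
```

The four entries of `contrast_ss` add up to `ss_groups`, because the four contrasts together with the all-ones vector form an orthogonal basis of the five-dimensional parameter space.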

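Unbalanced Case

Suppose there are different numbers of observations per group:

\[
Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \quad i = 1,\dots,I;\ j = 1,\dots,J_i .
\]

Most of the same tricks from before still work. Rewrite $\varepsilon_{ij}$ as

\[
\varepsilon_{ij} = \bar\varepsilon_{\cdot\cdot} + (\bar\varepsilon_{i\cdot} - \bar\varepsilon_{\cdot\cdot}) + (\varepsilon_{ij} - \bar\varepsilon_{i\cdot}).
\]

Square both sides of this expression and sum over all $i$ and $j$. The cross-product terms are still 0.

\[
\sum_{i=1}^{I} \sum_{j=1}^{J_i} \varepsilon_{ij}^2
= \sum_{i=1}^{I} \sum_{j=1}^{J_i} \bar\varepsilon_{\cdot\cdot}^2
+ \sum_{i=1}^{I} \sum_{j=1}^{J_i} (\bar\varepsilon_{i\cdot} - \bar\varepsilon_{\cdot\cdot})^2
+ \sum_{i=1}^{I} \sum_{j=1}^{J_i} (\varepsilon_{ij} - \bar\varepsilon_{i\cdot})^2 .
\]

Some substitutions are the same as before:

\[
\varepsilon_{ij} = Y_{ij} - \mu - \alpha_i, \qquad \bar\varepsilon_{i\cdot} = \bar Y_{i\cdot} - \mu - \alpha_i .
\]

But what about $\bar\varepsilon_{\cdot\cdot}$? It is NOT the average of the $\bar\varepsilon_{i\cdot}$. With $n = \sum_i J_i$ and $\bar Y_{\cdot\cdot} = \frac{1}{n}\sum_i \sum_j Y_{ij}$,

\[
\begin{aligned}
\bar\varepsilon_{\cdot\cdot}
&= \frac{1}{n} \sum_{i=1}^{I} \sum_{j=1}^{J_i} \varepsilon_{ij}
= \frac{1}{n} \sum_{i=1}^{I} J_i\, \bar\varepsilon_{i\cdot}
= \frac{1}{n} \sum_{i=1}^{I} J_i (\bar Y_{i\cdot} - \mu - \alpha_i) \\
&= -\mu + \frac{1}{n} \sum_{i=1}^{I} J_i (\bar Y_{i\cdot} - \alpha_i)
= \bar Y_{\cdot\cdot} - \mu - \frac{1}{n} \sum_{i=1}^{I} J_i \alpha_i .
\end{aligned}
\]

To continue as we did before, the convenient identifiability constraint is now $\sum_{i=1}^{I} J_i \alpha_i = 0$. So then $\bar\varepsilon_{\cdot\cdot} = \bar Y_{\cdot\cdot} - \mu$ and the rest works out as before.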

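\[
\sum_{i=1}^{I} \sum_{j=1}^{J_i} (Y_{ij} - \mu - \alpha_i)^2
= \sum_{i=1}^{I} \sum_{j=1}^{J_i} (\bar Y_{\cdot\cdot} - \mu)^2
+ \sum_{i=1}^{I} \sum_{j=1}^{J_i} (\bar Y_{i\cdot} - \bar Y_{\cdot\cdot} - \alpha_i)^2
+ \sum_{i=1}^{I} \sum_{j=1}^{J_i} (Y_{ij} - \bar Y_{i\cdot})^2 .
\]

This shows that $\hat\mu = \bar Y_{\cdot\cdot}$ and $\hat\alpha_i = \bar Y_{i\cdot} - \bar Y_{\cdot\cdot}$ when we use the identifiability constraint $\sum_{i=1}^{I} J_i \alpha_i = 0$.

Then the F statistic for testing group differences is now

\[
F = \frac{\sum_i J_i (\bar Y_{i\cdot} - \bar Y_{\cdot\cdot})^2 / (I-1)}{\sum_i \sum_j (Y_{ij} - \bar Y_{i\cdot})^2 / (n - I)} \sim F_{I-1,\, n-I} \quad\text{under } H .
\]

Thus the numerator of the F statistic is still the group sum of squares (but now appropriately weighted).

Imbalance doesn't complicate things too much in one-way ANOVA. Not so for two-way ANOVA, etc.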
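A small worked case may help make the weighting concrete. The function name `unbalanced_one_way_f` and the three tiny groups are made up for this sketch (NumPy assumed); the numbers are chosen so the result can be checked by hand.

```python
import numpy as np

def unbalanced_one_way_f(groups):
    """F statistic for a one-way layout with group sizes J_i that may differ."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    I = len(groups)
    sizes = np.array([len(g) for g in groups])            # J_1, ..., J_I
    n = sizes.sum()
    group_means = np.array([g.mean() for g in groups])    # Ybar_i.
    grand_mean = np.concatenate(groups).mean()            # Ybar_.. (weighted by J_i)
    alpha_hat = group_means - grand_mean
    ss_groups = np.sum(sizes * alpha_hat ** 2)            # weighted group sum of squares
    rss = sum(np.sum((g - m) ** 2) for g, m in zip(groups, group_means))
    f_stat = (ss_groups / (I - 1)) / (rss / (n - I))
    return f_stat, alpha_hat, sizes, grand_mean, group_means

# Made-up data: group means 3, 6, 12 with sizes 2, 3, 1; the grand mean is 36 / 6 = 6.
groups = [[2.0, 4.0], [5.0, 6.0, 7.0], [12.0]]
f_stat, alpha_hat, sizes, grand_mean, group_means = unbalanced_one_way_f(groups)
# ss_groups = 2*9 + 3*0 + 1*36 = 54, rss = 2 + 2 + 0 = 4, so F = (54 / 2) / (4 / 3) = 20.25
```

Note that the grand mean (6) differs from the unweighted average of the group means (7), and that the estimates satisfy the weighted constraint: `np.sum(sizes * alpha_hat)` is zero.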
